Goto

Collaborating Authors

 endangered language


Can LLMs Help Create Grammar?: Automating Grammar Creation for Endangered Languages with In-Context Learning

Spencer, Piyapath T, Kongborrirak, Nanthipat

arXiv.org Artificial Intelligence

Yes! In the present-day documenting and preserving endangered languages, the application of Large Language Models (LLMs) presents a promising approach. This paper explores how LLMs, particularly through in-context learning, can assist in generating grammatical information for low-resource languages with limited amount of data. We takes Moklen as a case study to evaluate the efficacy of LLMs in producing coherent grammatical rules and lexical entries using only bilingual dictionaries and parallel sentences of the unknown language without building the model from scratch. Our methodology involves organising the existing linguistic data and prompting to efficiently enable to generate formal XLE grammar. Our results demonstrate that LLMs can successfully capture key grammatical structures and lexical information, although challenges such as the potential for English grammatical biases remain. This study highlights the potential of LLMs to enhance language documentation efforts, providing a cost-effective solution for generating linguistic data and contributing to the preservation of endangered languages.


Building a Language-Learning Game for Brazilian Indigenous Languages: A Case of Study

Polleti, Gustavo

arXiv.org Artificial Intelligence

We discuss in detail the challenges of building a Language learning games are key tools to vitalize language learning tool for BIL, such as the lack of endangered languages (Thomason, 2015; Xu et al., written and phonetical resources, ethical concerns 2022; Neubig et al., 2020). LARA (Akhlaghi et al., on available treebanks and databases used for exercise 2019), a multi language learning assistant, is an generation, and provide some suggestions example that has been key to support actions related on steps forward. We managed to build a minimal to endangered languages protection (Rayner proof of concept course for Guajajara language divided and Wilmoth, 2023; Bédi et al., 2022; Zuckermann in two sections. We employed dependency et al., 2021). Despite the necessity of language treebanks and a lexical database on BIL as source learning tools to vitalize endangered languages, for exercise generation. The main contribution of they are typically restricted to high-resource languages, this work is to present a case of study on building a such as english, and require significant language learning tool for BIL and, we hope, it will effort to be extended to languages with few spoken serve as an starting point for the development of and written resources.


Half of World's Languages Could Be Extinct by 2100

U.S. News

But modern tools are helping to revive Ireland's national language. An Irish proverb advises that it is often wise for one to hold his tongue. An té is ciúine is é is buaine, or "he who is silent is the stronger." But that ancestral wisdom isn't the best policy when the very language it comes from is threatened. The Irish language, Gaelic, is one of more than 40 percent of the world's 6,000 spoken languages that are endangered, according to UNESCO.